Applying Word Sketches to Russian

نویسنده

  • Maria Khokhlova
چکیده

The paper describes work on writing a Russian Sketch grammar for the system Sketch Engine. The objective of such a system is to provide lexicographers with sufficient lexical material and tools for getting information about a word’s collocability and to generate lists of the most frequent phrases for a given word, and then to classify them for appropriate syntactic models. The system will give information about a word’s collocability on concrete dependency models, and will generate lists of the most frequent phrases for a given word for various grammatical models.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Building Russian Word Sketches as Models of Phrases

The paper describes the writing of Sketch Grammar for the Russian language as a part of the Sketch Engine system. The Sketch Engine representing itself a corpus tool which takes as input a corpus of any language and corresponding grammar patterns. The system gives information about a word’s collocability on concrete dependency models, and generates lists of the most frequent phrases for a given...

متن کامل

Studying Word Sketches for Russian

Without any doubt corpora are vital tools for linguistic studies and solution for applied tasks. Although corpora opportunities are very useful, there is a need of another kind of software for further improvement of linguistic research as it is impossible to process huge amount of linguistic data manually. The Sketch Engine representing itself a corpus tool which takes as input a corpus of any ...

متن کامل

Semantic Clustering of Russian Web Search Results: Possibilities and Problems

The present paper deals with word sense induction from lexical co-occurrence graphs. We construct such graphs on large Russian corpora and then apply the data to cluster the results of Mail.ru search according to meanings in the query. We compare different methods of performing such clustering and different source corpora. Models of applying distributional semantics to big linguistic data are d...

متن کامل

Word Sketches for Turkish

Word sketches are one-page, automatic, corpus-based summaries of a word’s grammatical and collocational behaviour. In this paper we present word sketches for Turkish. Until now, word sketches have been generated using a purpose-built finite-state grammars. Here, we use an existing dependency parser. We describe the process of collecting a 42 million word corpus, parsing it, and generating word ...

متن کامل

Corpus Analysis for Lexical Database Construction: A Case of Russian and Czech Wordnets

The paper deals with corpus-based methods applied to the particular tasks of lexical database construction. Different techniques of the corpus analysis are discussed and their applicability for the tasks is assessed. Corpus management system Manatee + Bonito developed at the Faculty of Informatics, Masaryk University in Brno, Czech Republic, is presented as a tool that enables to perform all di...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2009